gof test
- North America > United States > Alaska > Anchorage Municipality > Anchorage (0.04)
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
- North America > United States > California > San Mateo County > San Mateo (0.04)
- Asia > Middle East > Jordan (0.04)
- North America > United States > California > San Mateo County > San Mateo (0.04)
- North America > United States > Alaska > Anchorage Municipality > Anchorage (0.04)
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
- (2 more...)
- Health & Medicine (1.00)
- Information Technology > Security & Privacy (0.46)
- North America > United States > Alaska > Anchorage Municipality > Anchorage (0.04)
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
- North America > United States > California > San Mateo County > San Mateo (0.04)
- Asia > Middle East > Jordan (0.04)
On the Empirical Power of Goodness-of-Fit Tests in Watermark Detection
He, Weiqing, Li, Xiang, Shang, Tianqi, Shen, Li, Su, Weijie, Long, Qi
Large language models (LLMs) raise concerns about content authenticity and integrity because they can generate human-like text at scale. Text watermarks, which embed detectable statistical signals into generated text, offer a provable way to verify content origin. Many detection methods rely on pivotal statistics that are i.i.d. under human-written text, making goodness-of-fit (GoF) tests a natural tool for watermark detection. However, GoF tests remain largely underexplored in this setting. In this paper, we systematically evaluate eight GoF tests across three popular watermarking schemes, using three open-source LLMs, two datasets, various generation temperatures, and multiple post-editing methods. We find that general GoF tests can improve both the detection power and robustness of watermark detectors. Notably, we observe that text repetition, common in low-temperature settings, gives GoF tests a unique advantage not exploited by existing methods. Our results highlight that classic GoF tests are a simple yet powerful and underused tool for watermark detection in LLMs.
- North America > United States > Pennsylvania (0.04)
- North America > United States > Virginia > Arlington County (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Information Technology > Security & Privacy (1.00)
- Leisure & Entertainment > Sports > Basketball (0.68)
find our responses to your comments below. 2 Reviewer # 1: 3 Thank you for the positive comments on the novelty of our idea and insightful questions for further improvement
We would like to thank all three reviewers for acknowledging our contributions and providing valuable feedback. Thank you for the positive comments on the novelty of our idea and insightful questions for further improvement. We first characterize the solutions of DEAN. Other choices for the energy objective will be left to future works. Rejection rates for d = 1 (left) and d = 3 .
Detecting Concept Drift in Neural Networks Using Chi-squared Goodness of Fit Testing
Ayers, Jacob Glenn, Ramanan, Buvaneswari A., Khan, Manzoor A.
As the adoption of deep learning models has grown beyond human capacity for verification, meta-algorithms are needed to ensure reliable model inference. Concept drift detection is a field dedicated to identifying statistical shifts that is underutilized in monitoring neural networks that may encounter inference data with distributional characteristics diverging from their training data. Given the wide variety of model architectures, applications, and datasets, it is important that concept drift detection algorithms are adaptable to different inference scenarios. In this paper, we introduce an application of the $χ^2$ Goodness of Fit Hypothesis Test as a drift detection meta-algorithm applied to a multilayer perceptron, a convolutional neural network, and a transformer trained for machine vision as they are exposed to simulated drift during inference. To that end, we demonstrate how unexpected drops in accuracy due to concept drift can be detected without directly examining the inference outputs. Our approach enhances safety by ensuring models are continually evaluated for reliability across varying conditions.
- Asia > Japan > Honshū > Tōhoku > Fukushima Prefecture > Fukushima (0.05)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Using Perturbation to Improve Goodness-of-Fit Tests based on Kernelized Stein Discrepancy
Liu, Xing, Duncan, Andrew B., Gandy, Axel
Kernelized Stein discrepancy (KSD) is a score-based discrepancy widely used in goodness-of-fit tests. It can be applied even when the target distribution has an unknown normalising factor, such as in Bayesian analysis. We show theoretically and empirically that the KSD test can suffer from low power when the target and the alternative distributions have the same well-separated modes but differ in mixing proportions. We propose to perturb the observed sample via Markov transition kernels, with respect to which the target distribution is invariant. This allows us to then employ the KSD test on the perturbed sample. We provide numerical evidence that with suitably chosen transition kernels the proposed approach can lead to substantially higher power than the KSD test.
- North America > United States > New York > New York County > New York City (0.14)
- Asia > Japan > Honshū > Kantō > Kanagawa Prefecture (0.04)
- Europe > United Kingdom > England > Greater London > London (0.04)
- (8 more...)
Neural Stein critics with staged $L^2$-regularization
Repasky, Matthew, Cheng, Xiuyuan, Xie, Yao
Learning to differentiate model distributions from observed data is a fundamental problem in statistics and machine learning, and high-dimensional data remains a challenging setting for such problems. Metrics that quantify the disparity in probability distributions, such as the Stein discrepancy, play an important role in high-dimensional statistical testing. In this paper, we investigate the role of $L^2$ regularization in training a neural network Stein critic so as to distinguish between data sampled from an unknown probability distribution and a nominal model distribution. Making a connection to the Neural Tangent Kernel (NTK) theory, we develop a novel staging procedure for the weight of regularization over training time, which leverages the advantages of highly-regularized training at early times. Theoretically, we prove the approximation of the training dynamic by the kernel optimization, namely the ``lazy training'', when the $L^2$ regularization weight is large, and training on $n$ samples converge at a rate of ${O}(n^{-1/2})$ up to a log factor. The result guarantees learning the optimal critic assuming sufficient alignment with the leading eigen-modes of the zero-time NTK. The benefit of the staged $L^2$ regularization is demonstrated on simulated high dimensional data and an application to evaluating generative models of image data.
Auto-Encoding Goodness of Fit
Palmer, Aaron, Chi, Zhiyi, Aguiar, Derek, Bi, Jinbo
For generative autoencoders to learn a meaningful latent representation for data generation, a careful balance must be achieved between reconstruction error and how close the distribution in the latent space is to the prior. However, this balance is challenging to achieve due to a lack of criteria that work both at the mini-batch (local) and aggregated posterior (global) level. Goodness of fit (GoF) hypothesis tests provide a measure of statistical indistinguishability between the latent distribution and a target distribution class. In this work, we develop the Goodness of Fit Autoencoder (GoFAE), which incorporates hypothesis tests at two levels. At the mini-batch level, it uses GoF test statistics as regularization objectives. At a more global level, it selects a regularization coefficient based on higher criticism, i.e., a test on the uniformity of the local GoF p-values. We justify the use of GoF tests by providing a relaxed $L_2$-Wasserstein bound on the distance between the latent distribution and target prior. We propose to use GoF tests and prove that optimization based on these tests can be done with stochastic gradient (SGD) descent on a compact Riemannian manifold. Empirically, we show that our higher criticism parameter selection procedure balances reconstruction and generation using mutual information and uniformity of p-values respectively. Finally, we show that GoFAE achieves comparable FID scores and mean squared errors with competing deep generative models while retaining statistical indistinguishability from Gaussian in the latent space based on a variety of hypothesis tests.
- North America > United States > Connecticut (0.04)
- North America > United States > New York (0.04)
- Europe > Germany > Berlin (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.48)
Robust and efficient change point detection using novel multivariate rank-energy GoF test
Masud, Shoaib Bin, Aeron, Shuchin
In this paper, we use and further develop upon a recently proposed multivariate, distribution-free Goodness-of-Fit (GoF) test based on the theory of Optimal Transport (OT) called the Rank Energy (RE) [1], for non-parametric and unsupervised Change Point Detection (CPD) in multivariate time series data. We show that directly using RE leads to high sensitivity to very small changes in distributions (causing high false alarms) and it requires large sample complexity and huge computational cost. To alleviate these drawbacks, we propose a new GoF test statistic called as soft-Rank Energy (sRE) that is based on entropy regularized OT and employ it towards CPD. We discuss the advantages of using sRE over RE and demonstrate that the proposed sRE based CPD outperforms all the existing methods in terms of Area Under the Curve (AUC) and F1-score on real and synthetic data sets.